METHOD AND SYSTEM FOR
PROVIDING SPATIALIZED AUDIO 
IN CONFERENCE CALLS 

BACKGROUND OF THE INVENTION
Technical Field of the Invention 

The present invention relates generally to the telecommunications field and, 
in particular, to a method and system for providing spatialized audio in conference 
calls. 

Description of Related Art

Conference calls are becoming an increasingly common communications
medium. For example, a large corporation can have offices located throughout the
world, and the corporation's employees at different locations are often required to
consult with each other by conference call, in order to develop conclusions and
solutions for pressing problems. Furthermore, the younger generation's current use
of Internet chat rooms for "text-chats" will likely extend that practice to
"voice-chats" (i.e., conference calls).

The conventional conference call systems in use today utilize a single voice
channel for all participants, and a moderator typically controls the conference calls.
As such, an individual can participate in only one conference call at a time.

A number of significant problems exist with the existing approaches taken
for making conference calls. For example, during a conference call, it is often
difficult to recognize who is speaking by voice alone. This recognition problem
can be exacerbated if there are several participants in the conference call with
similar regional accents or voices that sound similar. Furthermore, two or more
conference call participants can be speaking at the same time, which degrades the
conversations. Another problem with existing conference call approaches is that
they do not make it possible to divide a conference call into a number of
sub-conferences, and also to allow participants to move freely between the
sub-conferences. Yet another problem with existing conference call approaches is that
a moderator needs to be appointed whenever the number of participants exceeds a
certain limit. However, as described in detail below, the present invention
successfully resolves the above-described problems.

SUMMARY OF THE INVENTION 

In accordance with the present invention, a method is provided for 
spatializing audio in conference calls, in which the participants in the calls are 
placed in particular locations in order to provide an additional dimension 
(direction) so that the participants can better recognize who is speaking. Also, the 
dimension of volume is provided, which can be used in creating background
sub-conferences. Consequently, sub-conferences can be conducted in which the
participants can move freely between them and also listen to other sub-conferences 
being conducted in the background. 

An important technical advantage of the present invention is that by using a 
spatial layout for a conference call, the audio streams from different
sub-conferences are distinguishable, and a user is then able to attend multiple
sub-conferences.

Another important technical advantage of the present invention is that by 
using a spatial layout for a conference call, one participant will always know which 
other participant is speaking, as long as the first participant knows the spatial 
location of the other participant. 

Yet another important technical advantage of the present invention is that 
the use of a spatial layout for conference calls can be used as an aid for auditory 
memory. This approach can be useful when a participant is located in a noisy
environment and it is difficult to recognize a speaker by voice alone, or when a
person participates in a conference with other people who are unknown to that
person and it is difficult to recognize the speaker by voice alone.



WO 00/48379 PCT/SE00/00204

BRIEF DESCRIPTION OF THE DRAWINGS

A more complete understanding of the method and apparatus of the present
invention may be had by reference to the following detailed description when taken
in conjunction with the accompanying drawings wherein:

FIGURE 1A is a diagram that illustrates a centralized conference call
system that can be used to provide spatialized audio, in accordance with a preferred
embodiment of the present invention;

FIGURE 1B is a diagram that illustrates a distributed conference call
system that can be used to provide spatialized audio, in accordance with the
preferred embodiment of the present invention;

FIGURE 2 is a flow diagram of an exemplary method that can be used by a
user of a terminal to register with a conference, in accordance with the preferred
embodiment of the present invention;

FIGURES 3A and 3B are related diagrams that illustrate a spatial layout for
a conference call, in accordance with the preferred embodiment of the present
invention; and

FIGURES 4A and 4B are related diagrams that illustrate a spatial layout for
a plurality of sub-conferences, in accordance with the preferred embodiment of the
present invention.

DETAILED DESCRIPTION OF THE DRAWINGS 

The preferred embodiment of the present invention and its advantages are best
understood by referring to FIGURES 1-4 of the drawings, like numerals being used for
like and corresponding parts of the various drawings.

Essentially, in accordance with the present invention, a method for spatializing
audio in conference calls is provided in which the participants in the calls are placed
in particular locations in order to provide an additional dimension (direction) so that
the participants can better recognize who is speaking. Also, the dimension of volume
is provided, which can be used in creating background sub-conferences.
Consequently, sub-conferences can be conducted in which the participants can move
seamlessly between them and also listen to other sub-conferences being conducted in
the background.

Specifically, as illustrated by the diagrams shown in the related FIGURES 1A
and 1B, there are two types of conference call systems that can provide spatialized
audio: centralized and distributed systems. In a centralized system (e.g., as shown in
FIGURE 1A), the spatialization process (i.e., the process of giving a direction to the
source of the audio stream) takes place at a central location. In a distributed system
(e.g., as shown in FIGURE 1B), the spatialization process takes place at each terminal
involved in the conference call. Notwithstanding the type of conference call system
involved, in accordance with a preferred embodiment of the present invention, each
participant in the conference call preferably wears a stereo headset or similar apparatus
which is fed by two relatively high-quality audio channels (e.g., > 20 kHz).
Consequently, in the preferred embodiment, the conference calls can be implemented
with 3-dimensional spatialization.
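
By way of illustration only, the notion of giving a direction to an audio source can be sketched with constant-power stereo panning. The specification does not prescribe any particular spatialization algorithm, so the panning law, the function name pan_stereo, and the azimuth convention (-90 degrees for hard left, +90 degrees for hard right) are all assumptions made for this sketch.

```python
import math


def pan_stereo(samples, azimuth_deg):
    """Return (left, right) sample lists for a mono source at azimuth_deg.

    Assumed convention: -90 is hard left, 0 is straight ahead, +90 is hard
    right. Constant-power panning is an illustrative stand-in only.
    """
    # Clamp to the front half circle so the gains stay in [0, 1].
    azimuth_deg = max(-90.0, min(90.0, azimuth_deg))
    # Map the azimuth onto a pan angle between 0 and pi/2.
    theta = (azimuth_deg + 90.0) / 180.0 * (math.pi / 2.0)
    left_gain = math.cos(theta)
    right_gain = math.sin(theta)
    left = [s * left_gain for s in samples]
    right = [s * right_gain for s in samples]
    return left, right
```

Because the squares of the two gains always sum to one, the perceived loudness of the source stays roughly constant as its direction changes; a full 3-dimensional rendering would require a richer model than this two-gain sketch.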

Referring to the exemplary centralized system 100 shown in FIGURE 1A, a
plurality of conference call participants are using terminals (e.g., telephones with
headsets) 102, 104, 106, which are connected to a network 108. For example, the
exemplary network 108 can be a Public Switched Telephone Network (PSTN), a
Public Land Mobile Network (PLMN), or the Internet, and the telephones (102-106)
can be fixed telephones, mobile radiotelephones, or Personal Computers (PCs),
respectively.

For this embodiment, the system 100 also includes a conference call control
unit 110 (e.g., part of a server or similar processing unit in the network) connected to
a plurality of command units 112a-n and spatialization units 114a-n. The command
units 112 and spatialization units 114 are further connected to a common audio bus
116. More precisely, each command unit 112 can output an audio signal to a
conductor in the audio bus 116, and a control (command) signal to the control unit
110. Each spatialization unit 114 can receive a plurality of audio signals from the
conductors of the audio bus 116, and output a spatialized audio signal which is
coupled to the network 108 and then to the terminals 102-106. As such, the control
unit, command units, spatialization units, etc., can be analog or digital units.
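
As a hedged sketch of what a single spatialization unit might do with the bus signals, the following function sums the mono signals on the bus conductors into one stereo signal for one listener. The dictionary-based bus, the per-channel gain pairs, and the name mix_for_listener are hypothetical; the specification states only that a spatialization unit receives a plurality of audio signals and outputs a spatialized audio signal.

```python
def mix_for_listener(bus, gains):
    """Combine bus signals into one stereo signal for a single listener.

    bus:   dict mapping a conductor (channel) id to a list of mono samples.
    gains: dict mapping a channel id to a (left_gain, right_gain) pair.
           Channels missing from gains (for example the listener's own
           channel) are left out of the mix; that choice is an assumption.
    """
    length = max((len(samples) for samples in bus.values()), default=0)
    left = [0.0] * length
    right = [0.0] * length
    for channel, samples in bus.items():
        if channel not in gains:
            continue
        left_gain, right_gain = gains[channel]
        for i, sample in enumerate(samples):
            left[i] += sample * left_gain
            right[i] += sample * right_gain
    return left, right
```

In a centralized arrangement one such mix would be computed per listener at the central location, each with its own gain table reflecting that listener's spatial layout.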

In operation, a user of a terminal (102, 104, 106, etc.) can send a voice
command or other type of control signal (e.g., DTMF tone) to an assigned command
unit (112) via an audio connection. For example, in order for the command unit to
distinguish between a voice command and ordinary speech, the user can initially
vocalize a predetermined keyword which is unique and unlikely to be interpreted as
ordinary speech. A recognition unit associated with the command unit can recognize
the spoken keyword as a voice command. Alternatively, for example, the user can
press a button (or key) that sends a unique DTMF code to the command unit. The
DTMF code can be recognized by the associated recognition unit as an instruction to
interpret subsequent vocalized words as commands. As another example, a
user-to-user signalling method of issuing commands can be used. For example, a signalling
channel can be used to send a command signal to the command unit. Such signalling
channels are available, for example, in ISDN networks and also in mobile
communications networks (e.g., in GSM). As such, in response, the command unit
112 routes the control signal to the control unit 110, and couples any audio signals to
a conductor (channel) in the audio bus 116. The control signal instructs the control
unit 110 about the user's preferences as to a spatial layout (e.g., formation of a
sub-conference, etc.). In response to the user's commands, the control unit 110 sends
spatial layout commands to the spatialization units 114, which combine the plurality
of audio signals received from the audio bus 116 so as to configure spatial layouts in
accordance with the users' preferences. The resulting spatialized audio signals are
then coupled to the users' terminals via the network 108.
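
The keyword approach to separating voice commands from ordinary speech can be sketched as follows, assuming a speech recognizer has already turned the audio into a list of words. The keyword value "comcon" and the function name split_command are hypothetical; the specification requires only that the keyword be unique and unlikely to occur in ordinary speech.

```python
KEYWORD = "comcon"  # hypothetical keyword; the specification names none


def split_command(words, keyword=KEYWORD):
    """Split recognized words into (speech, command).

    Words before the keyword are treated as ordinary speech and words after
    it as the command. When the keyword is absent the whole utterance is
    speech and the command is None.
    """
    lowered = [word.lower() for word in words]
    if keyword not in lowered:
        return list(words), None
    index = lowered.index(keyword)
    return list(words[:index]), list(words[index + 1:])
```

Under this sketch, only the command part would be routed to the control unit as a control signal, while the speech part would continue to the audio bus.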

More specifically, FIGURE 2 is a flow diagram of an exemplary method 150
that can be used by a user of a terminal to register with a conference, in accordance
with the preferred embodiment of the present invention. At step 152, the user (of a
terminal 102, etc.) calls a telephone number associated with the intended conference,
and at step 154, the call is routed (via the network 108) to a command unit (e.g., 112a).
At step 156, the user enters a dialog with the command unit 112a. During this dialog,
the command unit 112a interrogates the user to determine, for example, the user's
name, what conference the user intends to be connected to as an active participant, and
what conference(s) should remain in the background, etc. At step 158, the control unit
110 connects the user to at least one conductor (channel) of the audio bus 116, and
thus determines the relative position of the user in the conference.
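
Steps 156 and 158 of the registration method can be sketched as a small bookkeeping class. The class name ConferenceBus, the first-free-conductor policy, and the shape of the stored preferences are assumptions for illustration; the specification does not say how a conductor is chosen.

```python
class ConferenceBus:
    """Tracks which user occupies each conductor (channel) of the audio bus."""

    def __init__(self, num_channels):
        self.channels = [None] * num_channels  # conductor index -> user name
        self.preferences = {}                  # user name -> dialog answers

    def register(self, name, active, background=()):
        """Record the dialog answers (step 156) and connect the user (step 158).

        Returns the index of the assigned conductor, which in this sketch
        stands in for the user's relative position in the conference.
        """
        for index, occupant in enumerate(self.channels):
            if occupant is None:
                self.channels[index] = name
                self.preferences[name] = {
                    "active": active,
                    "background": list(background),
                }
                return index
        raise RuntimeError("no free conductor on the audio bus")
```

A caller might register a first user with bus.register("Ann", "sales") and a second with bus.register("Bo", "sales", ["ops"]); the names and conference labels here are placeholders.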

Referring to the exemplary distributed system 200 shown in FIGURE 1B, a
plurality of conference call participants are using terminals (e.g., telephones with
headsets) 202, 204, 206, 208, which are connected to a network 210. Again, for this
exemplary embodiment, the network 210 can be a PLMN or the Internet, and the
telephones (202-208) can be fixed telephones, mobile radiotelephones or PCs,
respectively. Alternatively, the network 210 can be a PSTN, which is technically
possible but less likely to be used in reality. For this embodiment, each of the
terminals 202-208 can be configured to include a control unit (e.g., 212) and a
spatialization unit (e.g., 214). In other words, the audio spatialization in this
distributed conference call system 200 is preferably accomplished at the terminals
involved in the conference call. Each terminal 202-208 outputs an audio signal (e.g.,
originating from a microphone for the respective user), which is coupled via the
network 210 to a spatialization unit in each of the other terminals involved in the
conference call.

In operation (referring to the distributed system shown in FIGURE 1B), with
a voice command or other command signal (DTMF tone), a terminal's user (e.g., for
terminal 202) inputs a control signal to the control unit (e.g., 212) in that terminal. For
example, as described earlier with respect to FIGURE 1A, the user's command can be
a unique keyword or DTMF code. Also as described earlier, a user-to-user signalling
method of issuing commands can be used (e.g., using a signalling channel to convey
a user's commands). The resulting control signal instructs the control unit 212 about
the user's preferences as to a spatial layout (e.g., create a sub-conference, etc.). In
response to the user's command, the control unit 212 sends a spatial layout command
signal to the spatialization unit 214, which combines the plurality of audio signals
received from the other terminals involved in the call, so as to configure a spatial
layout in accordance with the user's preference. The resulting spatialized audio
signals are then coupled to the user's (stereo) headset.

In accordance with the present invention, a user can identify a plurality of
sub-conferences by their relative spatial locations. The user can then select one of those
conferences for active participation. For example, a list of identifiers associated with
the sub-conferences and information about their relative positions can be displayed by
a terminal (e.g., using a PC via the Internet). The user then selects from the list in
order to participate in one or more of the sub-conferences. Alternatively, for example,
the user can initiate a procedure that browses the audio channels, whereby the system
couples the user to each of a succession of different sub-conferences (e.g., brings each
of the sub-conferences to the foreground in succession). As such, for example, during
a certain period of time, the user can select one (or more) of those sub-conferences for
active participation (e.g., by sending a DTMF signal or voice command to the control
unit 110 or 212).
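
The browsing procedure can be sketched as a generator that brings each sub-conference to the foreground in turn. The name browse and the two gain values are hypothetical; the specification says only that each sub-conference is brought to the foreground in succession.

```python
def browse(sub_conferences, foreground_gain=1.0, background_gain=0.2):
    """Yield (current, gains) once per browsing step.

    gains maps every sub-conference id to a volume gain, with the current
    sub-conference at foreground_gain and all others at background_gain.
    Both gain values are illustrative assumptions.
    """
    for current in sub_conferences:
        gains = {
            sub: (foreground_gain if sub == current else background_gain)
            for sub in sub_conferences
        }
        yield current, gains
```

A terminal or control unit could advance this generator on a timer and stop when the user sends a DTMF signal or voice command selecting the sub-conference currently in the foreground.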

FIGURES 3A and 3B are related diagrams that illustrate a spatial layout for a
conference call, in accordance with the preferred embodiment of the present invention.
Essentially, a round table approach can be used for selecting the position of the
participants involved in the conference call. In other words, for this embodiment,
there is a consistent left-to-right order used for the participants involved. As such,
referring to FIGURE 3A, from a first participant's (A) point of view with respect to
the spatial layout of the conference, that participant (A) is located in the center of a
circle, and the other participants (B-E) are located in the half circle in "front" of the
first participant (A). Similarly, referring to FIGURE 3B, from participant E's point
of view (maintaining the left-to-right order), that participant (E) is located in the center
of the circle, while the other participants (A-D) are in "front" of participant E. Such
a layout is preferable, because people prefer to hold conversations with other people
who are in front of them rather than behind them. Nevertheless, although a
left-to-right order is used for the preferred embodiment, the invention is not intended to be
so limited, and in a different embodiment, a right-to-left order can be used.
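
One possible reading of the round table layout is sketched below: the seating order is treated as cyclic, and each listener hears the other participants spread across the front half circle in that order. The even angular spacing, the azimuth convention (-90 degrees at the far left, +90 degrees at the far right), and the function name front_half_circle are assumptions; the figures are not reproduced here, so the exact angles are illustrative only.

```python
def front_half_circle(participants, listener):
    """Map each other participant to an azimuth in degrees for one listener.

    The list order is the consistent left-to-right order. It is treated as
    cyclic, so the participant following the listener appears left-most.
    """
    i = participants.index(listener)
    others = participants[i + 1:] + participants[:i]
    count = len(others)
    if count == 0:
        return {}
    if count == 1:
        return {others[0]: 0.0}  # a single other participant sits straight ahead
    step = 180.0 / (count - 1)
    return {name: -90.0 + k * step for k, name in enumerate(others)}
```

For five participants A to E, this places B, C, D and E in front of A and places A, B, C and D in front of E, each time in the same left-to-right order. Reversing the list before the call would give the right-to-left alternative.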

FIGURES 4A and 4B are related diagrams that illustrate a spatial layout for a
plurality of sub-conferences, in accordance with the preferred embodiment of the
present invention. Referring to FIGURE 4A, a plurality of sub-conferences 302, 304,
306, 308, 310 are shown for an exemplary spatial layout. The sub-conferences can be
created by a user sending a command signal (e.g., a voice command or DTMF
tone) to the appropriate control unit (110 or 212) shown in FIGURE 1A or 1B. If a
user desires to leave a current sub-conference, the user can send a "leave" command
to the appropriate control unit. That user (e.g., user x in FIGURE 4A) is moved to the
outer circle where all other sub-conferences are located in the spatial layout. As
shown, the user x is not participating in any conference or sub-conference, and users
A-E are participating in the same sub-conference 302.

In order to participate in another sub-conference, a user can send an "approach
<argument>" command to the appropriate control unit. The "argument" is the
identifier of one of the participants in that other sub-conference. The user will be
placed in "front" of the participant identified in the "argument" in the spatial layout of
that sub-conference, and then the user can begin to participate in that sub-conference.
Notably, the user can also participate in another sub-conference without having to send
a "leave" command to the control unit. Note that all of the sub-conferences are located
at a relatively large spatial distance from any other sub-conference. Consequently, a
participant in one sub-conference will hear all other participants in that sub-conference
in the foreground, and the participants in other sub-conferences in the background. As
illustrated by FIGURE 4B, if a user (A) does not participate in any sub-conference,
that user can be spatially located as shown. As such, that user (A) can listen to all of
the sub-conferences (304-312) simultaneously, and can join and participate in one of
the sub-conferences as desired.
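
The "leave" and "approach" commands can be sketched as operations on a membership table. The function name handle_command and the dictionary of sets are hypothetical, and treating "leave" as leaving every sub-conference at once is a simplifying assumption; the specification describes leaving a current sub-conference.

```python
def handle_command(members, user, command, argument=None):
    """Apply a "leave" or "approach" command and return the user's memberships.

    members:  dict mapping a sub-conference id to a set of participant ids.
    argument: for "approach", the id of a participant in the target
              sub-conference.
    """
    if command == "leave":
        # Simplifying assumption: the user leaves every sub-conference.
        for group in members.values():
            group.discard(user)
    elif command == "approach":
        # Join each sub-conference in which the named participant takes part.
        for group in members.values():
            if argument in group:
                group.add(user)
    else:
        raise ValueError("unknown command: " + str(command))
    return sorted(sub for sub, group in members.items() if user in group)
```

Because "approach" does not remove any existing membership, a user can take part in a second sub-conference without first sending "leave", as the description above allows.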

In accordance with the present invention, by using such sub-conferences in a
spatial layout as shown, there is less need to appoint a conference moderator
(nevertheless, moderators will still likely be used for the sub-conferences). In any
event, the participants can create sub-conferences themselves if so desired, and move
between the sub-conferences freely. As such, using the present invention's spatial
layout approach, the conference/sub-conference participants can be in effect their own
moderators. Furthermore, by using such spatial layouts, users can attend multiple
conferences as long as the audio streams from the conferences are distinguishable. In
this regard, human beings are capable of monitoring numerous conversations
simultaneously, and can focus on any one of the conversations while placing the other
conversations in the background. This cognitive phenomenon, which the present
invention preferably takes advantage of, is the so-called "Cocktail Party Effect". In
other words, using a stereo headset along with the present invention's spatial layout
approach, for example, a conference/sub-conference participant can distinguish audio
streams from other participants due to their different spatial locations. The audio
streams can appear to be coming from different directions (e.g., from different
locations in a sub-conference), or to be originating at different distances (e.g., from
other sub-conferences).
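
The impression of distance can be approximated, in the simplest case, by lowering the volume of far-away sources. The inverse-distance law, the reference distance, and the name distance_gain below are assumptions for illustration; the specification states only that volume is used as a dimension for placing sub-conferences in the background.

```python
def distance_gain(distance, reference=1.0):
    """Return a volume gain in (0, 1] for a source at the given distance.

    Sources at or inside the reference distance play at full volume; beyond
    it the gain falls off as reference / distance (an assumed law).
    """
    if distance <= 0:
        raise ValueError("distance must be positive")
    return min(1.0, reference / distance)
```

With this sketch, participants in the listener's own sub-conference (near the reference distance) stay in the foreground, while a sub-conference placed four reference distances away would be heard at one quarter of the amplitude.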

In order to create stereo sound in the preferred embodiment, the system can use
pairs of audio channels. For example, in a fixed public network, an ISDN connection
can provide two channels that can be used to provide stereo sound. Similarly, for
example, a wideband cellular system can assign two channels for each such stereo
connection to be used.

Although a preferred embodiment of the method and apparatus of the present
invention has been illustrated in the accompanying Drawings and described in the
foregoing Detailed Description, it will be understood that the invention is not limited
to the embodiment disclosed, but is capable of numerous rearrangements,
modifications and substitutions without departing from the spirit of the invention as
set forth and defined by the following claims.
WHAT IS CLAIMED IS:

1. A system for distinguishing between participants in a conference call,
comprising:

a plurality of terminals, each one of said plurality of terminals configured to
send an audio signal and receive a spatial audio signal, said plurality of audio signals
creating said conference call;

a control unit coupled to said plurality of terminals, said control unit
configured to output a spatial layout signal responsive to at least one command signal
from said plurality of terminals; and

a plurality of audio spatialization units, each one of said plurality of audio
spatialization units coupled to said control unit and said plurality of terminals, at least
one of said plurality of audio spatialization units configured to output said spatial
audio signal responsive to said spatial layout signal.

2. The system of Claim 1, wherein said at least one command signal
includes a voice command associated with said spatial layout signal.

3. The system of Claim 1, wherein said at least one command signal
includes a DTMF tone associated with said spatial layout signal.

4. The system of Claim 2 or 3, wherein said at least one command signal
includes a data signal transmitted over a signalling channel.

5. The system of Claim 1, further comprising a command recognition unit
coupled to at least one of said plurality of terminals and said control unit.

6. The system of Claim 1, wherein each one of said plurality of terminals
is coupled to said control unit and said at least one of said plurality of audio
spatialization units by a telecommunications network.

7. The system of Claim 6, wherein said telecommunications network
comprises a PSTN.

8. The system of Claim 6, wherein said telecommunications network
comprises a PLMN.

9. The system of Claim 6, wherein said telecommunications network
comprises an Internet.

10. The system of Claim 5, wherein said command recognition unit 
includes a voice recognition circuit. 

11. The system of Claim 5, wherein said command recognition unit 
includes a tone recognition circuit. 

12. The system of Claim 5, wherein said command recognition unit
includes data signalling receiving circuitry.

13. The system of Claim 1, wherein said spatial layout signal determines 
a position for each of said participants in said conference call. 

14. The system of Claim 1, wherein said spatial layout signal determines
a position for at least one of said participants outside of said conference call.

15. The system of Claim 1, wherein said spatial layout signal comprises a
left-to-right order for said participants in said conference call.

16. The system of Claim 1, wherein said plurality of terminals, said control
unit, and said plurality of audio spatialization units comprise analog terminals and
units.

17. The system of Claim 1, wherein said plurality of terminals, said control
unit, and said plurality of audio spatialization units comprise digital terminals and
units.

18. A terminal for use in distinguishing between participants in a
conference call, comprising:

a control unit configured to output a spatial layout signal responsive to a first
participant's preference; and

an audio spatialization unit coupled to said control unit, said audio
spatialization unit configured to receive a plurality of audio signals from terminals
associated with other participants, and output a spatial audio signal comprising said
plurality of audio signals arranged in response to said spatial layout signal.

19. The terminal of Claim 18, wherein said audio spatialization unit is 
coupled to said terminals associated with said other participants by a 
telecommunications network. 

20. The terminal of Claim 19, wherein said telecommunications network
comprises a PSTN.

21. The terminal of Claim 19, wherein said telecommunications network
comprises a PLMN.

22. The terminal of Claim 19, wherein said telecommunications network
comprises an Internet.

23. The terminal of Claim 18, further comprising an analog terminal.

24. The terminal of Claim 18, further comprising a digital terminal.



25. A method for distinguishing between participants in a conference call,
comprising:

each one of a plurality of terminals outputting an audio signal, said plurality
of audio signals creating said conference call;

generating a spatial layout signal responsive to at least one command signal;
and

generating a spatial audio signal responsive to said spatial layout signal, said
spatial audio signal comprising said plurality of audio signals.